Transferring Sentiment Knowledge between Words and Tweets
Abstract
Message-level and word-level polarity classification are two popular tasks in Twitter sentiment analysis. They have been commonly addressed by training supervised models from labelled data. The main limitation of these models is the high cost of data annotation. Transferring existing labels from a related problem domain is one possible solution for this problem. In this paper, we study how to transfer sentiment labels from the word domain to the tweet domain and vice versa by making their corresponding instances compatible. We model instances of these two domains as the aggregation of instances from the other (i.e., tweets are treated as collections of the words they contain and words are treated as collections of the tweets in which they occur) and perform aggregation by averaging the corresponding constituents. We study two different setups for averaging tweet and word vectors: 1) representing tweets by standard NLP features such as unigrams and part-of-speech tags and words by averaging the vectors of the tweets in which they occur, and 2) representing words using skip-gram embeddings and tweets as the average embedding vector of their words. A consequence of our approach is that instances of both domains reside in the same feature space. Thus, a sentiment classifier trained on labelled data from one domain can be used to classify instances from the other one. We evaluate this approach in two transfer learning tasks: 1) sentiment classification of tweets by applying a word-level sentiment classifier, and 2) induction of a polarity lexicon by applying a tweet-level polarity classifier. Our results show that the proposed model can successfully classify words and tweets after transfer.
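The aggregation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two-dimensional "embeddings", the toy lexicon, and the helper names (`average`, `tweet_vector`, `word_vectors_from_tweets`, `train_centroids`, `predict`) are all assumptions made for illustration, and a nearest-centroid rule stands in for whatever classifier the paper actually uses. The point is only that once words and tweets share a feature space, a model trained on one can label the other.

```python
from collections import defaultdict


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def tweet_vector(tokens, word_vecs):
    """Setup 2: a tweet is the average embedding of the words it contains."""
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    return average(known) if known else None


def word_vectors_from_tweets(tweets, tweet_vecs):
    """Setup 1: a word is the average vector of the tweets in which it occurs."""
    by_word = defaultdict(list)
    for tokens, vec in zip(tweets, tweet_vecs):
        for token in set(tokens):  # count each tweet once per word
            by_word[token].append(vec)
    return {word: average(vecs) for word, vecs in by_word.items()}


def train_centroids(vectors, labels):
    """Toy nearest-centroid classifier (an assumed stand-in for a real model)."""
    groups = defaultdict(list)
    for vec, label in zip(vectors, labels):
        groups[label].append(vec)
    return {label: average(vecs) for label, vecs in groups.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


# Made-up 2-d word vectors; real skip-gram embeddings have hundreds of dimensions.
word_vecs = {
    "good": [1.0, 0.2], "great": [0.9, 0.1],
    "bad": [-1.0, 0.1], "awful": [-0.8, 0.3],
    "movie": [0.0, 1.0],
}
# Hypothetical word-level labels (a tiny polarity lexicon).
lexicon = {"good": "positive", "great": "positive",
           "bad": "negative", "awful": "negative"}

# Train on words, then classify a tweet living in the same feature space.
word_clf = train_centroids([word_vecs[w] for w in lexicon], list(lexicon.values()))
label = predict(word_clf, tweet_vector(["great", "movie"], word_vecs))
print(label)
```

The reverse direction (lexicon induction) follows the same pattern: build word vectors with `word_vectors_from_tweets` from tweet-level feature vectors, train on labelled tweets, and call `predict` on each word vector.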
Similar resources
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of tweets. To this end, we create subjectivity lexicons in which the words into ...
Building a robust sentiment lexicon with (almost) no resource
Creating sentiment polarity lexicons is labor intensive. Automatically translating them from resourceful languages requires in-domain machine translation systems, which rely on large quantities of bi-texts. In this paper, we propose to replace machine translation by transferring words from the lexicon through word embeddings aligned across languages with a simple linear transform. The approach ...
2016 Olympic Games on Twitter: Sentiment Analysis of Sports Fans Tweets using Big Data Framework
Big data analytics is one of the most important subjects in computer science. Today, due to the increasing expansion of Web technology, a large amount of data is available to researchers. Extracting information from these data is one of the requirements for many organizations and business centers. In recent years, the massive amount of Twitter's social networking data has become a platform for ...
Annotate-Sample-Average (ASA): A New Distant Supervision Approach for Twitter Sentiment Analysis
The classification of tweets into polarity classes is a popular task in sentiment analysis. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. A drawback of these approaches is the high cost involved in data annotation. Two freely available resources that can be exploited to solve the problem are: 1) large amounts...
LSIS at SemEval-2017 Task 4: Using Adapted Sentiment Similarity Seed Words For English and Arabic Tweet Polarity Classification
In this paper, we present our contribution to SemEval-2017 Task 4, "Sentiment Analysis in Twitter", subtask A, "Message Polarity Classification", for English and Arabic. Our system is based on a list of sentiment seed words adapted for tweets. The sentiment relations between seed words and other terms are captured by cosine similarity between the word embedding representations (word2...